Jonathan Tang
Real Estate Market Analysis
Data Mining/Scraping
Python

Real Estate Market Analysis



RE/MAX

remax-balloon-logo - Copy.jpg

.txt files ranking

We wrote a Python script that reads text files of property listings saved from the RE/MAX website and ranks them by how many of our predefined criteria they meet.

We gathered 15 text files from RE/MAX, using Jonathan's hometown of Monterey Park, CA, as the example search.

chrome_OU21L85V2J.png

carbon (9).png

The criteria above are what we look for in each property; they can be edited to suit a client's preferences.

Here's a code snippet from the Python script:

carbon (12).png

Using our Python script, we ranked the 15 properties by how many of our criteria they met.

carbon (6).png

824 S Garfield Ave and 135 W Newmark Ave Apt. A each met all 4 criteria, scoring 4 points.
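
As a rough illustration of the ranking step, the sketch below gives each listing file one point per criterion it meets and sorts the files by score. The four criteria, the regular expressions, the sample listing text, and the helper names (`first_number`, `at_least`, `at_most`, `score`, `rank_listings`) are all assumptions made for this example; our actual criteria are the ones shown in the screenshot above.

```python
import re
import tempfile
from pathlib import Path


def first_number(pattern: str, text: str):
    """Return the first number captured by `pattern`, or None if nothing matches."""
    match = re.search(pattern, text, re.IGNORECASE)
    return float(match.group(1).replace(",", "")) if match else None


def at_least(pattern: str, minimum: float):
    """Build a check that passes when the captured number is >= minimum."""
    def check(text: str) -> bool:
        value = first_number(pattern, text)
        return value is not None and value >= minimum
    return check


def at_most(pattern: str, maximum: float):
    """Build a check that passes when the captured number is <= maximum."""
    def check(text: str) -> bool:
        value = first_number(pattern, text)
        return value is not None and value <= maximum
    return check


# Hypothetical criteria (assumptions, not the project's real ones).
CRITERIA = {
    "3+ beds": at_least(r"(\d+(?:\.\d+)?)\s*beds?", 3),
    "2+ baths": at_least(r"(\d+(?:\.\d+)?)\s*baths?", 2),
    "under $900,000": at_most(r"\$(\d[\d,]*)", 900_000),
    "garage": lambda text: "garage" in text.lower(),
}


def score(text: str) -> int:
    """One point for every criterion the listing text satisfies."""
    return sum(1 for check in CRITERIA.values() if check(text))


def rank_listings(folder: Path) -> list[tuple[str, int]]:
    """Score every .txt file in `folder` and return (name, score), best first."""
    scores = [
        (path.stem, score(path.read_text(encoding="utf-8")))
        for path in sorted(folder.glob("*.txt"))
    ]
    return sorted(scores, key=lambda item: item[1], reverse=True)


# Made-up listings written to a temporary folder so the example is self-contained.
samples = {
    "listing_a": "$850,000 | 3 beds | 2 baths | attached garage",
    "listing_b": "$1,200,000 | 4 beds | 3 baths | street parking",
    "listing_c": "$700,000 | 2 beds | 1 bath | carport",
}

with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    for name, text in samples.items():
        (folder / f"{name}.txt").write_text(text, encoding="utf-8")
    ranking = rank_listings(folder)

for name, points in ranking:
    print(f"{name}: {points}/{len(CRITERIA)}")
```

Keeping the criteria in one dictionary means a client's preferences can be changed in a single place without touching the scoring or sorting code.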


data.ct.gov

chrome_vTHG47raeE.png

.json file scraping

We wrangled a .json dataset from data.ct.gov containing all public real estate sales in the state of Connecticut from 2001 to 2020.
The file is 286 MB and contains over 997,000 listings.

Metadata:

carbon (4).png

Example listing:

carbon (5).png

Using a Python script, we calculated the median sale price, sales ratio, and number of sales in 5-year intervals in Connecticut.

Here's a code snippet from the Python script:

carbon (14).png

And here is the output:

carbon (7).png
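
For readers who want to try the 5-year aggregation themselves, here is a minimal sketch using pandas. It runs on a tiny made-up table rather than the real 286 MB file, and the column names (`list_year`, `sale_amount`, `sales_ratio`), the interval boundaries, and the choice of the median for the sales ratio are assumptions for illustration, not necessarily what our script does.

```python
import pandas as pd

# Tiny made-up sample standing in for the full dataset.
# Column names are assumptions and may differ from the real file.
sales = pd.DataFrame(
    {
        "list_year": [2001, 2003, 2004, 2006, 2009, 2012, 2015, 2018, 2020],
        "sale_amount": [150_000, 180_000, 210_000, 250_000, 230_000,
                        240_000, 260_000, 300_000, 320_000],
        "sales_ratio": [0.70, 0.68, 0.66, 0.60, 0.72, 0.71, 0.69, 0.65, 0.63],
    }
)

# Right-closed bins: (2000, 2005], (2005, 2010], (2010, 2015], (2015, 2020]
edges = [2000, 2005, 2010, 2015, 2020]
labels = ["2001-2005", "2006-2010", "2011-2015", "2016-2020"]
sales["period"] = pd.cut(sales["list_year"], bins=edges, labels=labels)

# One row per 5-year period with the three statistics of interest.
summary = sales.groupby("period", observed=True).agg(
    median_sale_price=("sale_amount", "median"),
    median_sales_ratio=("sales_ratio", "median"),
    number_of_sales=("sale_amount", "size"),
)
print(summary)
```

`pd.cut` assigns each year to a labeled interval, so the grouping logic stays the same if the interval width is changed later.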

For full insights and recommendations, see our report at this link:

https://colab.research.google.com/drive/1POQchp4YoYtba_NeKvmjXo5mqAZ9of4P?usp=sharing


Realtor

rdc-logo-default.png

.html webpage scraping

We wrote a Python script that uses BeautifulSoup to scrape the first 10 pages of Connecticut property listings. We wanted to analyze current listing prices and compare them with the historical sale prices from the data.ct.gov .json dataset.

chrome_0Iwj3Lq2OV.jpg
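
The general shape of this kind of scraper can be sketched as follows. The example parses a small made-up HTML string instead of fetching live pages, and the tag names, class names, sample listings, and helper names (`to_number`, `text_of`, `parse_listings`) are all hypothetical; the real site's markup is different and would need its own selectors.

```python
import csv
import io

from bs4 import BeautifulSoup

# Made-up markup standing in for one results page (hypothetical class names).
SAMPLE_PAGE = """
<ul>
  <li class="listing">
    <span class="address">1 Example St</span>
    <span class="city">Sampletown</span>
    <span class="price">$250,000</span>
    <span class="bed">3</span>
    <span class="bath">2.5</span>
    <span class="sqft">1,400</span>
  </li>
  <li class="listing">
    <span class="address">2 Placeholder Ave</span>
    <span class="city">Testville</span>
    <span class="price">$199,900</span>
    <span class="bed">2</span>
    <span class="bath">1</span>
    <span class="sqft">980</span>
  </li>
</ul>
"""


def to_number(text: str) -> float:
    """Strip '$' and thousands separators, then convert to a number."""
    return float(text.replace("$", "").replace(",", ""))


def text_of(card, css_class: str) -> str:
    """Return the stripped text of the first element with the given class."""
    return card.find(class_=css_class).get_text(strip=True)


def parse_listings(html: str) -> list[dict]:
    """Turn one page of listing cards into a list of row dictionaries."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for card in soup.find_all("li", class_="listing"):
        rows.append(
            {
                "Address": text_of(card, "address"),
                "City": text_of(card, "city"),
                "Price": int(to_number(text_of(card, "price"))),
                "Bed": int(to_number(text_of(card, "bed"))),
                "Bath": to_number(text_of(card, "bath")),
                "Sqft": int(to_number(text_of(card, "sqft"))),
            }
        )
    return rows


# A real run would loop over the result pages and extend `rows` page by page.
rows = parse_listings(SAMPLE_PAGE)

# Write the rows as CSV (to an in-memory buffer here instead of a file).
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()))
writer.writeheader()
writer.writerows(rows)
csv_text = buffer.getvalue()
print(csv_text)
```

Converting prices and square footage to numbers at scrape time keeps the resulting .csv ready for analysis without further cleaning.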

We stored the scraped data in a .csv file for local access. Here's what some of the data looks like:

In [ ]:
import pandas as pd

df = pd.read_csv('realtor_connecticut.csv')
df.head(5)
Out[ ]:
   Unnamed: 0           Address           City   Price  Bed  Bath  Sqft
0           0        4 Bobin Rd       Plymouth  260000    3   2.5  1400
1           1         6 Joan Dr        Enfield  194900    3   2.0  1848
2           2   579 Wauregan Rd       Brooklyn  299000    3   1.0  1136
3           3  7 Candle Hill Rd  New Fairfield  279900    3   1.0   819
4           4   7 Southeast Trl    New Milford  359000    3   1.0  1235

Here is a code snippet from our Python script:

carbon (15).png

Using the data, we calculated the median and mean price of current listings.

carbon (8).png
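
To show how these two statistics are computed, here is a minimal sketch that uses only the five sample rows displayed by `df.head(5)` earlier. Because it covers five listings rather than the full scrape, its numbers are not the results reported above; the variable names are ours for this example.

```python
import pandas as pd

# Prices of the five sample rows shown earlier (not the full dataset).
prices = pd.Series([260000, 194900, 299000, 279900, 359000], name="Price")

median_price = prices.median()  # middle value once the prices are sorted
mean_price = prices.mean()      # sum of the prices divided by their count

print(f"Median price: ${median_price:,.0f}")
print(f"Mean price:   ${mean_price:,.0f}")
```

On the full dataset the same two calls would be `df["Price"].median()` and `df["Price"].mean()`; comparing them gives a quick sense of how strongly a few expensive listings pull the average upward.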

Here are some plots from the aggregated data:

In [ ]:
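import matplotlib.pyplot as plt
import seaborn as sns

# Ten $100,000-wide bins covering $0 to $1,000,000
bins = list(range(0, 1000001, 100000))
sns.set_style("darkgrid")
sns.histplot(df, x="Price", bins=bins, binrange=[0, 1000000], color="purple").set(
    title="Prices of Connecticut Homes"
)
# Show full dollar amounts on the x-axis instead of scientific notation
plt.ticklabel_format(style="plain", axis="x")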

download.png

In [ ]:
sns.boxplot(data=df[["Bed", "Bath"]], palette="flare", orient="h", width=0.3).set(
    title="Distribution of Beds and Baths in Connecticut Homes"
)
plt.xlim(0, 7.5)

download (1).png

In [ ]:
city_order = df.groupby("City")["Price"].mean().sort_values(ascending=False).index

sns.barplot(
    data=df, x="City", y="Price", order=city_order[:10], errorbar=None, palette="flare"
)

plt.xlabel("City")
plt.ylabel("Average Price")
plt.title("Most Expensive Cities in Connecticut (Average Price)")
plt.xticks(rotation=45)
plt.ticklabel_format(style="plain", axis="y")

plt.show()

download (2).png

For full insights and recommendations, see our report at this link:

https://colab.research.google.com/drive/1ymcqtzDyNP6t2T0wbVGt1WBO5a0-ae0s?usp=sharing



Team 7
Jonathan Tang